Collocations Computed from the Web
Abstract
This paper describes a prototype system for verifying the correctness of every verb-preposition collocation found in a given text. Verification relies on statistics drawn from the world's largest corpus, the Internet, obtained through the Google Web APIs service. The probability that a collocation is correct is computed using three measures: proportional score, t-score, and mutual information.
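The three measures named in the abstract can be sketched from raw hit counts as follows. This is a minimal illustration, not the paper's implementation: the abstract gives no formulas, so the code uses the common textbook forms of pointwise mutual information and the t-score, and the definition of the proportional score (the share of hits a candidate pair receives among competing prepositions for the same verb) is an assumption. All function names, parameters, and counts are hypothetical.

```python
import math


def mutual_information(pair_count, verb_count, prep_count, total):
    """Pointwise mutual information: log2(P(v, p) / (P(v) * P(p)))."""
    if pair_count == 0:
        return float("-inf")  # the pair was never observed together
    return math.log2((pair_count * total) / (verb_count * prep_count))


def t_score(pair_count, verb_count, prep_count, total):
    """Textbook approximation: (observed - expected) / sqrt(observed)."""
    if pair_count == 0:
        return 0.0
    expected = verb_count * prep_count / total
    return (pair_count - expected) / math.sqrt(pair_count)


def proportional_score(pair_count, rival_counts):
    """ASSUMED definition: hits for this pair divided by the hits for
    all candidate prepositions tested with the same verb."""
    total_hits = pair_count + sum(rival_counts)
    return pair_count / total_hits if total_hits else 0.0


# Hypothetical counts for illustration only (not real search-engine data).
TOTAL_PAGES = 1_000_000   # assumed corpus size
verb_hits = 1_000         # pages containing the verb
prep_hits = 50_000        # pages containing the preposition
pair_hits = 400           # pages containing the verb-preposition pair
rival_hits = [75, 25]     # pages for the same verb with other prepositions

mi = mutual_information(pair_hits, verb_hits, prep_hits, TOTAL_PAGES)
t = t_score(pair_hits, verb_hits, prep_hits, TOTAL_PAGES)
share = proportional_score(pair_hits, rival_hits)
```

Higher values of each score indicate stronger evidence that the verb and preposition belong together. Any threshold for accepting a collocation would need to be tuned on real data.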
Similar resources
How Statistical Information from the Web can Help Identify Named Entities
This paper presents a Natural Language Processing (NLP) approach to filter Named Entities (NE) from a list of collocation candidates. The NE are defined as the names of ’People’, ’Places’, ’Organizations’, ’Software’, ’Illnesses’, and so forth. The proposed method is based on statistical measures associated with Web resources to identify NE. Our method has three stages: (1) Building artificial ...
Web-Based Measurements of Intra-collocational Cohesion in Oxford Collocations Dictionary
Cohesion between components of collocations is already acknowledged measurable by means of the Web, and cohesion measurements are used for some applications and extraction of new collocations. Taking a specific cohesion criterion SCI, we performed massive evaluations of collocate cohesion in Oxford Collocations Dictionary. For three groups of modificative collocations (adjective-noun, adverbad...
Finding domain specific collocations and concordances on the Web
TerminoWeb is a web-based platform designed to find and explore specialized domain knowledge on the Web. An important aspect of this exploration is the discovery of domain-specific collocations on the Web and their presentation in a concordancer to provide contextual information. Such information is valuable to a translator or a language learner presented with a source text containing a specifi...
Collocation Extraction Using Web Statistics
This paper mines collocations from two different web usage corpora, NTU proxy log and TTS search log. The precisions for NTU and TTS test data are 61.76% and 57.50%, respectively, by human judgment for 2% sampling of extracted collocations. For automatic evaluation, we submit extracted collocation to Google search engine, and the resulting page counts are used to compute the mutual information ...
The Construction of a Chinese Collocational Knowledge Resource and Its Application for Second Language Acquisition
The appropriate use of collocations is a challenge for second language acquisition. However, high quality and easily accessible Chinese collocation resources are not available for both teachers and students. This paper presents the design and construction of a large scale resource of Chinese collocational knowledge, and a web-based application (OCCA, Online Chinese Collocation Assistant) which ...